Abstract. The border rank of the matrix multiplication operator for $n \times n$ matrices is a
standard measure of its complexity. Using techniques from algebraic geometry and representation
theory, we show the border rank is at least $2n^2 - n$. Our bounds are better than the previous
lower bound (due to Lickteig in 1985) of $\frac{3}{2}n^2 + \frac{n}{2} - 1$ for all $n \geq 3$. The bounds are obtained by
finding new equations that bilinear maps of small border rank must satisfy, i.e., new equations
for secant varieties of triple Segre products, that matrix multiplication fails to satisfy.



1. Introduction and statement of results 

Finding lower bounds in complexity theory is considered difficult. For example, chapter 14 
of [1] is "Circuit lower bounds: Complexity theory's Waterloo". The complexity of matrix 
multiplication is roughly equivalent to the complexity of many standard operations in linear 
algebra, such as taking the determinant or inverse of a matrix. A standard measure of the 
complexity of an operation is the minimal length of a straight line program (or circuit) needed 
to perform it. Another measure is just to count the number of multiplications performed. The 
exponent of matrix multiplication $\omega$ is defined to be $\lim_n \log_n$ of the arithmetic cost to multiply
$n \times n$ matrices, or equivalently, $\lim_n \log_n$ of the minimal number of multiplications needed. (The
result that these are equivalent justifies ignoring additions.) Determining the complexity of 
matrix multiplication is a central question of practical importance. We give new lower bounds 
for its complexity in terms of border rank. 

The rank one bilinear maps are those that can be executed using just one scalar multiplication. 
The rank of a bilinear map T is the smallest r such that T can be written as a sum of r rank 
one bilinear maps. In other words, let $A, B, C$ be vector spaces, with dual spaces $A^*, B^*, C^*$,
and let $T : A^* \times B^* \to C$ be a bilinear map. Then the rank of $T$ is the smallest $r$ such that
there exist $a_1, \dots, a_r \in A$, $b_1, \dots, b_r \in B$, $c_1, \dots, c_r \in C$ such that $T(\alpha, \beta) = \sum_{i=1}^{r} a_i(\alpha) b_i(\beta) c_i$.
The border rank of $T$ is the smallest $r$ such that $T$ can be written as a limit of a sequence of
bilinear maps of rank $r$. Let $\underline{R}(T)$ denote the border rank of $T$ and $R(T)$ its rank.

Let $M_{\langle m,n,l\rangle} : \mathrm{Mat}_{m\times n} \times \mathrm{Mat}_{n\times l} \to \mathrm{Mat}_{m\times l}$ denote the matrix multiplication operator. One
has (see, e.g., [4, Props. 15.1, 15.5, 15.10]) that $\omega = \lim_n (\log_n \underline{R}(M_{\langle n,n,n\rangle}))$, where $\underline{R}$ is the border rank. For more on the
relation between border rank and other measures of complexity, see [4]. Naively, the border rank satisfies
$\underline{R}(M_{\langle m,n,l\rangle}) \leq mnl$ via the standard algorithm. In 1969, V. Strassen [13] showed that the rank satisfies $R(M_{\langle 2,2,2\rangle}) \leq 7$ and,
as a consequence, $R(M_{\langle n,n,n\rangle}) = O(n^{2.81})$. Further upper bounds have been derived since
then by numerous authors, with the current record $R(M_{\langle n,n,n\rangle}) = O(n^{2.3727})$ [15]. In 1983
Strassen showed [12] that $\underline{R}(M_{\langle n,n,n\rangle}) \geq \frac{3}{2}n^2$, and shortly thereafter T. Lickteig [8] showed
$\underline{R}(M_{\langle n,n,n\rangle}) \geq \frac{3}{2}n^2 + \frac{n}{2} - 1$. Since then no further general lower bound had been found (although
it is now known that $\underline{R}(M_{\langle 2,2,2\rangle}) = 7$, see [5]).
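Strassen's upper bound of 7 for $2 \times 2$ matrices can be checked directly. The following is a minimal sketch using the standard textbook form of the seven products (the function names are illustrative and not taken from [13]); it compares the result with the naive eight-multiplication algorithm.

```python
def strassen_2x2(X, Y):
    """Multiply two 2x2 matrices using seven scalar multiplications."""
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    p1 = (a + d) * (e + h)
    p2 = (c + d) * e
    p3 = a * (f - h)
    p4 = d * (g - e)
    p5 = (a + b) * h
    p6 = (c - a) * (e + f)
    p7 = (b - d) * (g + h)
    # Only additions and subtractions are used to assemble the result.
    return [[p1 + p4 - p5 + p7, p3 + p5],
            [p2 + p4, p1 - p2 + p3 + p6]]


def naive_2x2(X, Y):
    """Standard algorithm: eight scalar multiplications."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Since both maps are bilinear, agreement on all pairs of matrix units is enough to conclude that they agree everywhere.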

Our results are as follows: 
Theorem 1.1. Let $n \leq m$. Then

(1) $\underline{R}(M_{\langle m,n,l\rangle}) \geq \frac{nl(n+m-1)}{m}$.

Corollary 1.2.

(2) $\underline{R}(M_{\langle n,n,l\rangle}) \geq 2nl - l$,

(3) $\underline{R}(M_{\langle n,n,n\rangle}) \geq 2n^2 - n$.

Thus for $3 \times 3$ matrices, the state of the art is $15 \leq \underline{R}(M_{\langle 3,3,3\rangle}) \leq 21$; the upper bound is due
to Schönhage [11].
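To make the comparison with Lickteig's bound concrete, here is a small sketch that evaluates both lower bounds in exact arithmetic. The function names are ours; the formulas are those of Theorem 1.1 and of Lickteig's bound $\frac{3}{2}n^2 + \frac{n}{2} - 1$ as quoted in the abstract.

```python
from fractions import Fraction
from math import ceil


def theorem_1_1_bound(m, n, l):
    """Lower bound n*l*(n+m-1)/m of Theorem 1.1 (which assumes n <= m)."""
    if n > m:
        raise ValueError("Theorem 1.1 assumes n <= m")
    return Fraction(n * l * (n + m - 1), m)


def lickteig_bound(n):
    """Lickteig's lower bound 3n^2/2 + n/2 - 1 for n x n matrices."""
    return Fraction(3 * n * n, 2) + Fraction(n, 2) - 1


# Border rank is an integer, so each bound may be rounded up.
for n in range(2, 7):
    print(n, ceil(lickteig_bound(n)), ceil(theorem_1_1_bound(n, n, n)))
```

For $n = 2$ the two bounds agree, and from $n = 3$ on the bound of Theorem 1.1 is strictly larger.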

Our results include other bounds that might be better asymptotically. For example: 
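Theorem 1.3. Set $p \leq \lceil \frac{mn}{2} \rceil - 1$ and assume $n \leq m$. Then

(4) $\underline{R}(M_{\langle m,n,l\rangle}) \geq l \cdot \dfrac{n\binom{mn}{p} - \sum_{j=0}^{p-m} (-1)^j \binom{mn}{p-m-j}\binom{m+j-1}{m-1}\binom{m+n+j}{n-1}}{\binom{mn-1}{p}}$.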

In the case m = n, set q = 2 • [" "^V 1 " 1 ^ ] — 1, and set p = |"|] — 1. Then 

n 2 $ n^C-iyU-^rDCSSi) 



(5) E(M <W> ) > 



C; 1 ) (V) 



Remark 1.4. For small values, Theorem 1.1 gives better bounds, but it may be the case that 
asymptotically the bounds in Theorem 1.3 are better, although this does not appear to be the 
case up to n = 270. Independent of matrix multiplication, determining the limiting values gives 
rise to interesting questions in asymptotic representation theory. 

Remark 1.5. The best lower bounds for the rank $R$ of matrix multiplication are $R(M_{\langle n,m,l\rangle}) \geq lm + mn + l - m + n - 3$,
$R(M_{\langle n,n,l\rangle}) \geq 2ln - l + 2n - 2$, and $R(M_{\langle n,n,n\rangle}) \geq \frac{5}{2}n^2 - 3n$. These
are all due to Bläser; the first two are in [3], and the third in [2].

Our bounds come from explicit equations that bilinear maps of low border rank must satisfy. 
These equations are best expressed in the language of tensors. Our method is similar in nature to 
the method used by Strassen to get his lower bounds - we find explicit polynomials that tensors of 
low border rank must satisfy, and show that matrix multiplication fails to satisfy them. Strassen 
found his equations via linear algebra - taking the commutator of certain matrices. We found 
ours using representation theory and algebraic geometry. (Algebraic geometry is not needed for 
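presenting the results. For its role in our method see [7].) More precisely, in §3 we define, for
every $p$, a linear map

$(M_{\langle m,n,l\rangle})_A^{\wedge p} : B^* \otimes \Lambda^p A \to \Lambda^{p+1} A \otimes C$

(here $A \otimes B \otimes C$ is the space containing $M_{\langle m,n,l\rangle}$, with $\dim A = mn$),
and we prove that $\underline{R}(M_{\langle m,n,l\rangle}) \geq \binom{mn-1}{p}^{-1} \operatorname{rank}[(M_{\langle m,n,l\rangle})_A^{\wedge p}]$. We then compute the rank of
the linear map $(M_{\langle m,n,l\rangle})_A^{\wedge p}$. The above-mentioned equations are the minors of the linear map
$(M_{\langle m,n,l\rangle})_A^{\wedge p}$. The rank computation is done with the help of representation theory: we explicitly describe the
kernel as a sum of irreducible representations labeled by certain Young diagrams. Equation (4)
is obtained in §4 by expressing the kernel of $(M_{\langle m,n,l\rangle})_A^{\wedge p}$ as the last term of an exact sequence,
and then computing the alternating sum of the dimensions of the spaces involved, and the proof
of (5) is similar.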
Remark 1.6. Viewing matrix multiplication as a map $\mathbb{C}^{3n^2} \to \mathbb{C}$, it is generally expected in the
computer science community that one should have a lower bound on the border rank growing asymptotically like $3n^2$,
the "input size". On the other hand it is also conjectured that $\underline{R}(M_{\langle n,n,n\rangle})$ grows like $O(n^2)$.

A truly significant lower bound would be a function that grew like $3n^2 h(n)$ where $h$ is an
increasing function. No such super-linear lower bound on the complexity of any explicit tensor
(or any computational problem) is known, see [1, 14].

From a mathematician's perspective, all known equations for secant varieties of Segre varieties
that have a geometric model arise by translating multi-linear algebra to linear algebra, and
it appears that the limit of this technique is roughly the input size.

Remark 1.7. The methods used here should be applicable to lower bound problems coming from 
the Geometric Complexity Theory (GCT) introduced by Mulmuley and Sohoni [9], in particular 
to separate the determinant (small weakly skew circuits) from polynomials with small formulas 
(small tree circuits). 

Overview. In §2 we describe the new equations to test for border rank in the language of 
tensors. In §3 we apply these equations to matrix multiplication. Theorems 1.3 and 1.1 are 
respectively proved in §4 and §5. We conclude in §6 with a review of Lickteig's method for 
purposes of comparison. 

Acknowledgments. We thank K. Mulmuley and A. Wigderson for discussions regarding the
perspective of computer scientists, P. Bürgisser and A. Wigderson for help improving the
exposition, J. Hauenstein for help with computer calculations, and M. Bläser for help with the
literature.

2. The new equations 

Let $A, B, C$ be complex vector spaces of dimensions $\mathbf{a}, \mathbf{b}, \mathbf{c}$, with $\mathbf{b} \leq \mathbf{c}$, and with dual vector
spaces $A^*, B^*, C^*$. Then $A \otimes B \otimes C$ may be thought of as the space of bilinear maps $A^* \times B^* \to C$.
We work in projective space as the objects we are interested in are invariant under rescaling.

Let $Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C) \subset \mathbb{P}(A \otimes B \otimes C)$ denote the Segre variety of rank one tensors and let
$\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ denote its $r$-th secant variety, the variety of tensors of border rank at
most $r$.

The most naive equations for $\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ are the so-called flattenings. Given
$T \in A \otimes B \otimes C$, consider $T_B : B^* \to A \otimes C$ as a linear map. Then $\underline{R}(T) \geq \operatorname{rank}(T_B)$ and similarly
for cyclic permutations of $A, B, C$. The rank of a linear map is determined by taking minors.
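As an illustration of flattenings, the sketch below builds the matrix multiplication tensor in a basis of matrix units and computes the ranks of its three flattenings by exact Gaussian elimination. The basis labelling $a = (i,j)$, $b = (j,k)$, $c = (k,i)$ and all function names are our own choices for the example.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank of a rectangular matrix by Gaussian elimination over the rationals."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0]) if mat else 0):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(rank + 1, len(mat)):
            if mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [x - factor * y for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def matmul_tensor(m, n, l):
    """Nonzero coefficients of the matrix multiplication tensor, keyed by (a, b, c)."""
    return {((i, j), (j, k), (k, i)): 1
            for i, j, k in product(range(m), range(n), range(l))}


def flattening_rank(tensor, factor):
    """Rank of the flattening that separates one factor (0 = A, 1 = B, 2 = C)."""
    row_labels = sorted({key[factor] for key in tensor})
    col_labels = sorted({key[:factor] + key[factor + 1:] for key in tensor})
    row_of = {lab: r for r, lab in enumerate(row_labels)}
    col_of = {lab: c for c, lab in enumerate(col_labels)}
    mat = [[0] * len(col_labels) for _ in row_labels]
    for key, coeff in tensor.items():
        mat[row_of[key[factor]]][col_of[key[:factor] + key[factor + 1:]]] = coeff
    return matrix_rank(mat)


T = matmul_tensor(2, 2, 2)
print([flattening_rank(T, f) for f in range(3)])
```

For matrix multiplication the flattenings have full rank, so they only give the bound $\max(mn, nl, ml)$, far below the bounds discussed here.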

In [7] we proposed a generalization of flattenings, called Young flattenings, which in the
present context is as follows: Recall that irreducible polynomial representations of the general
linear group $GL(A)$ correspond to partitions $\pi = (\pi_1, \dots, \pi_{\mathbf{a}})$. Let $S_\pi A$ denote the corresponding
$GL(A)$-module. Consider representations $S_\pi A$, $S_\mu B$, $S_\nu C$, and the identity maps
$\mathrm{Id}_{S_\pi A} \in S_\pi A \otimes S_\pi A^*$ etc. Then we may consider

$T \otimes \mathrm{Id}_{S_\pi A} \otimes \mathrm{Id}_{S_\mu B} \otimes \mathrm{Id}_{S_\nu C} \in A \otimes B \otimes C \otimes S_\pi A \otimes S_\pi A^* \otimes S_\mu B \otimes S_\mu B^* \otimes S_\nu C \otimes S_\nu C^*$.

We may decompose $S_\pi A \otimes A$ according to the Pieri rule and project to one irreducible component,
say $S_{\tilde\pi} A$, where $\tilde\pi$ is obtained by adding a box to $\pi$, and similarly for $C$, while for $B$ we may
decompose $S_\mu B^* \otimes B$ and project to one irreducible component, say $S_{\hat\mu} B^*$, where $\hat\mu$ is obtained
by deleting a box from $\mu$. The upshot is a tensor

$T' \in S_{\tilde\pi} A \otimes S_\mu B \otimes S_{\tilde\nu} C \otimes S_\pi A^* \otimes S_{\hat\mu} B^* \otimes S_\nu C^*$

which we may then consider as a linear map, e.g.,

$T' : S_\pi A \otimes S_\mu B^* \otimes S_\nu C \to S_{\tilde\pi} A \otimes S_{\hat\mu} B^* \otimes S_{\tilde\nu} C$

and rank conditions on $T'$ often give border rank conditions on $T$.

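Strassen's equations [12] may be understood in this framework. As described in [10], tensor $T$
with $\mathrm{Id}_A$ and project to obtain a map

$T_A^{\wedge 1} : B^* \otimes A \to \Lambda^2 A \otimes C$.

If $T$ is generic, then one can show that $T_A^{\wedge 1}$ will have maximal rank, and if $T = a \otimes b \otimes c$ is of
rank one, $\operatorname{rank}((a \otimes b \otimes c)_A^{\wedge 1}) = \mathbf{a} - 1$. It follows that $\underline{R}(T) \geq \frac{\operatorname{rank}(T_A^{\wedge 1})}{\mathbf{a} - 1}$. Thus the best bound one
could hope for with this technique is up to $r = \frac{\mathbf{b}\mathbf{a}}{\mathbf{a} - 1}$. The minors of order $r(\mathbf{a} - 1) + 1$ of $T_A^{\wedge 1}$
give equations for $\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$. This is most effective when $\mathbf{a} = 3$.

When $\mathbf{a} > 3$, for each 3-plane $A' \subset A$, consider the restriction $T|_{A' \otimes B \otimes C}$ and the corresponding
equations, to obtain modules of equations for $\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ for $r \leq \frac{3\mathbf{b}}{2}$. This procedure
is called inheritance.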

We consider the next simplest cases: 

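(6) $T_A^{\wedge p} : B^* \otimes \Lambda^p A \to \Lambda^{p+1} A \otimes C$.

To avoid redundancies, assume $\mathbf{b} \leq \mathbf{c}$ and $p \leq \lceil \frac{\mathbf{a}}{2} \rceil - 1$. Then, if $T = a \otimes b \otimes c$ is of rank one,

$\operatorname{rank}((a \otimes b \otimes c)_A^{\wedge p}) = \binom{\mathbf{a} - 1}{p}$.

To see this, expand $a = a_1$ to a basis $a_1, \dots, a_{\mathbf{a}}$ of $A$ with dual basis $a^1, \dots, a^{\mathbf{a}}$ of $A^*$. Then
$T_A^{\wedge p} = [a^{i_1} \wedge \cdots \wedge a^{i_p} \otimes b] \otimes [a_1 \wedge a_{i_1} \wedge \cdots \wedge a_{i_p} \otimes c]$ (summed over $i_1 < \cdots < i_p$), so the image is isomorphic to $\Lambda^p(A/a_1) \otimes c$.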
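The rank-one computation can be checked numerically. The sketch below assembles the matrix of $T_A^{\wedge p}$ in the basis of wedge monomials, assuming $T = \sum_i a_i \otimes X_i$ is given by its slices $X_i : B^* \to C$ (stored as $\mathbf{c} \times \mathbf{b}$ arrays), so that $\beta \otimes f \mapsto \sum_i (a_i \wedge f) \otimes X_i(\beta)$. The data layout and function names are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def matrix_rank(rows):
    """Rank of a rectangular matrix by Gaussian elimination over the rationals."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0]) if mat else 0):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(rank + 1, len(mat)):
            if mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [x - factor * y for x, y in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def koszul_flattening(slices, p):
    """Matrix of the map B* (x) Lambda^p A -> Lambda^{p+1} A (x) C defined by the slices."""
    a, c, b = len(slices), len(slices[0]), len(slices[0][0])
    source = list(combinations(range(a), p))
    target = {S: idx for idx, S in enumerate(combinations(range(a), p + 1))}
    mat = [[0] * (len(source) * b) for _ in range(len(target) * c)]
    for s_idx, S in enumerate(source):
        for i in range(a):
            if i in S:
                continue  # a_i wedge a_S vanishes
            # Sign from moving a_i past the basis vectors of S with smaller index.
            sign = (-1) ** sum(1 for s in S if s < i)
            t_idx = target[tuple(sorted(S + (i,)))]
            for k in range(c):
                for j in range(b):
                    mat[t_idx * c + k][s_idx * b + j] += sign * slices[i][k][j]
    return mat


def rank_one_slices(avec, bvec, cvec):
    """Slices of the rank-one tensor avec (x) bvec (x) cvec."""
    return [[[ai * ck * bj for bj in bvec] for ck in cvec] for ai in avec]


T = rank_one_slices([1, 2, 0, -1], [1, 1], [2, -1])
print(matrix_rank(koszul_flattening(T, 1)), comb(4 - 1, 1))
```

The two printed numbers should agree, reflecting the formula $\binom{\mathbf{a}-1}{p}$ for a rank-one tensor.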

Remark 2.1. Alternatively, one can compute the rank using the vector bundle techniques of [7]. 

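When $T$ is generic, we expect $T_A^{\wedge p}$ to be injective, thus potentially obtaining modules of
equations up to

$r = \frac{\binom{\mathbf{a}}{p}\mathbf{b}}{\binom{\mathbf{a}-1}{p}} = \frac{\mathbf{b}\mathbf{a}}{\mathbf{a} - p}$.

Since this is an increasing function of $p$, one gets the most equations taking $p$ equal to its
maximal value, $p = \lceil \frac{\mathbf{a}}{2} \rceil - 1$, and again by inheritance, one potentially obtains new modules of
equations up to roughly $\sigma_{2\mathbf{b}}(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$. A consequence of Theorem 1.1 is that this
is indeed the case. For example, since our equations when $\mathbf{a} = n^2$ are nontrivial up to at least
$2n^2 - n$, we obtain:

Corollary 2.2. Set $\mathbf{a} \leq \mathbf{b} \leq \mathbf{c}$. Then the maps $T_A^{\wedge p}$ give nontrivial equations for
$\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ for $r \leq 2\mathbf{a} - \sqrt{\mathbf{a}}$.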

We record the following proposition which follows from Stirling's formula and the discussion 
above. 

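Proposition 2.3. The equations for $\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ obtained by taking minors of $T_A^{\wedge p}$
are of degree $r\binom{\mathbf{a}-1}{p} + 1$. In particular, when $r$ approaches the upper bound $2\mathbf{b}$ and $p = \lceil \frac{\mathbf{a}}{2} \rceil - 1$,
the equations are asymptotically of degree $\sqrt{\frac{2}{\pi}}\,\frac{2^{\mathbf{a}}\mathbf{b}}{\sqrt{\mathbf{a}-1}}$.

Writing $t = r\binom{\mathbf{a}-1}{p}$, the equations lie in

(7) $\Lambda^{t+1}(\Lambda^p A \otimes B^*) \otimes \Lambda^{t+1}(\Lambda^{p+1} A^* \otimes C^*) = \bigoplus_{|\mu| = t+1,\ |\nu| = t+1} S_\mu(\Lambda^p A) \otimes S_{\mu'} B^* \otimes S_\nu(\Lambda^{p+1} A^*) \otimes S_{\nu'} C^*$

and determining their precise module structure (i.e., which irreducible submodules of (7) actually
contribute nontrivial equations) appears to be difficult.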
Theorem 1.1 is obtained by applying the inheritance principle to the case of an $(n + m - 1)$-plane
$A' \subset A = \mathbb{C}^{nm}$.

3. Matrix multiplication 

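Let $M, N, L$ be vector spaces of dimensions $m, n, l$. Write $A = M \otimes N^*$, $B = N \otimes L^*$,
$C = L \otimes M^*$ so $\mathbf{a} = mn$, $\mathbf{b} = nl$, $\mathbf{c} = ml$. The matrix multiplication operator $M_{\langle m,n,l\rangle}$ is
$M_{\langle m,n,l\rangle} = \mathrm{Id}_M \otimes \mathrm{Id}_N \otimes \mathrm{Id}_L \in A \otimes B \otimes C$. Let $U = N^*$. We compute the kernel of the map

$(M_{\langle m,n,l\rangle})_A^{\wedge p} : L \otimes U \otimes \Lambda^p(M \otimes U) \to L \otimes M^* \otimes \Lambda^{p+1}(M \otimes U)$.

Assume $\mathbf{b} \leq \mathbf{c}$, so $n \leq m$. For a partition $\pi = (\pi_1, \dots, \pi_n)$, let $\ell(\pi)$ denote the number of parts
of $\pi$, i.e., the largest $k$ such that $\pi_k > 0$. Let $\pi'$ denote the conjugate partition to $\pi$.

Lemma 3.1. $\ker (M_{\langle m,n,l\rangle})_A^{\wedge p} = \bigoplus_\pi S_{\pi+(1)} U \otimes S_{\pi'} M \otimes L$, where the summation is over partitions
$\pi = (m, \nu_1, \dots, \nu_{n-1})$ where $\nu = (\nu_1, \dots, \nu_{n-1})$ is a partition of $p - m$, $\nu_1 \leq m$ and $\pi + (1) = (m+1, \nu_1, \dots, \nu_{n-1})$.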

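Proof. Write $(M_{\langle m,n,l\rangle})_A^{\wedge p} = \psi_p \otimes \mathrm{Id}_L$, where $\psi_p : \Lambda^p(M \otimes U) \otimes U \to M^* \otimes \Lambda^{p+1}(M \otimes U)$. It is clear
by Schur's lemma that such modules are contained in the kernel, as there is no corresponding
module in the target for it to map to.

We show that all other modules in $\Lambda^p(M \otimes U) \otimes U$ are not in the kernel by computing $\psi_p$ at
weight vectors. Set $T' = \mathrm{Id}_U \otimes \mathrm{Id}_M$, so $\psi_p = (T')_A^{\wedge p}$. Write $T' = (u^i \otimes m_\alpha) \otimes m^\alpha \otimes u_i$, where
$1 \leq i \leq n$, $1 \leq \alpha \leq m$, $(u^i)$ is the dual basis to $(u_i)$ and similarly for $(m_\alpha)$ and $(m^\alpha)$, and the
summation convention is used throughout. Then

$T' \otimes \mathrm{Id}_{\Lambda^p A} = (u^i \otimes m_\alpha) \otimes m^\alpha \otimes u_i \otimes [(u_{j_1} \otimes m^{\beta_1}) \wedge \cdots \wedge (u_{j_p} \otimes m^{\beta_p})] \otimes [(u^{j_1} \otimes m_{\beta_1}) \wedge \cdots \wedge (u^{j_p} \otimes m_{\beta_p})]$

and

$(T')_A^{\wedge p} = [(u_{j_1} \otimes m^{\beta_1}) \wedge \cdots \wedge (u_{j_p} \otimes m^{\beta_p})] \otimes u_i \otimes m^\alpha \otimes [(u^{j_1} \otimes m_{\beta_1}) \wedge \cdots \wedge (u^{j_p} \otimes m_{\beta_p}) \wedge (u^i \otimes m_\alpha)]$

$\quad = \bigoplus_{|\mu| = p,\ |\nu| = p+1} c_{\mu+\epsilon}(u_{j_1} \otimes \cdots \otimes u_{j_p} \otimes u_i) \otimes c_{\mu'}(m^{\beta_1} \otimes \cdots \otimes m^{\beta_p}) \otimes c_\nu(u^{j_1} \otimes \cdots \otimes u^{j_p} \otimes u^i) \otimes c_{\nu'}(m_{\beta_1} \otimes \cdots \otimes m_{\beta_p} \otimes m_\alpha) \otimes m^\alpha$.

Here the $c_\pi$'s are Young symmetrizers, and if we write

$\mu = ((q_1)^{s_1}, \dots, (q_f)^{s_f}) = (q_1, \dots, q_1, q_2, \dots, q_2, \dots, q_f, \dots, q_f)$,

then $\epsilon = (0, \dots, 0, 1, 0, \dots, 0)$ where the 1 can be in the slots $1, s_1 + 1, s_1 + s_2 + 1, \dots, s_1 + \cdots + s_f + 1$, the last only if
$s_1 + \cdots + s_f < n$. Now thinking of $\psi_p : \Lambda^p A \otimes U \to \Lambda^{p+1} A \otimes M^*$ as a linear map and recalling
Schur's lemma, it is clear that if $\mu + \epsilon = \nu$ and $\nu'$ is of the form $\mu' + \epsilon'$ where $\epsilon'$, similar to $\epsilon$, has
a one in any slot where there is a jump in the partition (i.e., as allowed by the Pieri rule), then
the map is the identity on the corresponding module, and otherwise the map is zero. There are
corresponding modules except when the modules are as in the statement of the lemma. □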

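Remark 3.2. If we let $B' = U$, $C' = M$, then in the proof above we are really just computing
the rank of $(T')_A^{\wedge p}$ where $T' \in A \otimes B' \otimes C'$ is $\mathrm{Id}_U \otimes \mathrm{Id}_M$. The maximal border rank of a tensor
$T$ in $\mathbb{C}^{mn} \otimes \mathbb{C}^m \otimes \mathbb{C}^n$ is $mn$, which occurs anytime the map $T : (\mathbb{C}^{mn})^* \to \mathbb{C}^m \otimes \mathbb{C}^n$ is injective, so
$T'$ is a generic tensor in $A \otimes B' \otimes C'$, and the calculation of $\operatorname{rank}(T')_A^{\wedge p}$ is determining the maximal
rank of $T_A^{\wedge p}$ for a generic element of $\mathbb{C}^{mn} \otimes \mathbb{C}^n \otimes \mathbb{C}^m$.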
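Example 3.3. Consider the case $m = n = 3$, take $p = 4$. Let $\alpha_1, \alpha_2, \alpha_3$ denote the Young diagrams of the partitions

$\alpha_1 = (3,1)$, $\alpha_2 = (2,2)$, $\alpha_3 = (2,1,1)$.

Note that $\alpha_1 = \alpha_3'$. Then

$\Lambda^4(M \otimes N^*) = (S_{\alpha_1}M \otimes S_{\alpha_3}N^*) \oplus (S_{\alpha_2}M \otimes S_{\alpha_2}N^*) \oplus (S_{\alpha_3}M \otimes S_{\alpha_1}N^*)$.

Observe that, by the Pieri rule (diagrams with more than three rows vanish since $\dim N = 3$),

$S_{\alpha_3}N^* \otimes N^* = S_{3,1,1}N^* \oplus S_{2,2,1}N^*$,
$S_{\alpha_2}N^* \otimes N^* = S_{3,2}N^* \oplus S_{2,2,1}N^*$,
$S_{\alpha_1}N^* \otimes N^* = S_{4,1}N^* \oplus S_{3,2}N^* \oplus S_{3,1,1}N^*$.

Among the seven summands on the right-hand side, only $(4,1)$ does not fit in the $3 \times 3$
square. The kernel of $(M_{\langle 3,3,l\rangle})_A^{\wedge 4}$ in this case is $L \otimes S_{4,1}N^* \otimes S_{2,1,1}M$, corresponding to
$\pi = (3,1)$ and $\pi + (1) = (4,1)$, which has dimension $l \cdot 24 \cdot 3 = 72\,l$.
Hence the rank of $(M_{\langle 3,3,l\rangle})_A^{\wedge 4}$ is $3l\binom{9}{4} - 72\,l = 306\,l$ and $\underline{R}(M_{\langle 3,3,l\rangle}) \geq \left\lceil \frac{306\,l}{\binom{8}{4}} \right\rceil$, which
coincides with Lickteig's bound of 14 when $l = 3$.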
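The dimensions used in Example 3.3 can be reproduced with the hook content formula for $\dim S_\lambda(\mathbb{C}^n)$. The following sketch (function names are ours) computes $\dim S_{4,1}\mathbb{C}^3$ and $\dim S_{2,1,1}\mathbb{C}^3$, and also checks that the three summands of $\Lambda^4(M \otimes N^*)$ for $m = n = 3$ add up to $\binom{9}{4}$.

```python
from math import comb


def conjugate(partition):
    """Conjugate partition, i.e. the list of column lengths of the Young diagram."""
    rows = [r for r in partition if r > 0]
    return [sum(1 for r in rows if r > j) for j in range(rows[0])] if rows else []


def schur_module_dim(partition, n):
    """Dimension of S_partition(C^n) via the hook content formula."""
    rows = [r for r in partition if r > 0]
    if len(rows) > n:
        return 0  # more rows than dim C^n: the module is zero
    cols = conjugate(rows)
    numerator, denominator = 1, 1
    for i, row in enumerate(rows):
        for j in range(row):
            numerator *= n + j - i                         # n + content of box (i, j)
            denominator *= (row - j) + (cols[j] - i) - 1   # hook length of box (i, j)
    return numerator // denominator


kernel_dim_per_l = schur_module_dim([4, 1], 3) * schur_module_dim([2, 1, 1], 3)
print(kernel_dim_per_l)

# Decomposition of Lambda^4(M (x) N*) into S_alpha M (x) S_alpha' N* for m = n = 3.
total = sum(schur_module_dim(alpha, 3) * schur_module_dim(conjugate(alpha), 3)
            for alpha in ([3, 1], [2, 2], [2, 1, 1]))
print(total, comb(9, 4))
```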



4. Proof of formula (4) 

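We compute the dimension of $\ker (M_{\langle m,n,l\rangle})_A^{\wedge p} = \ker \psi_p \otimes \mathrm{Id}_L$ via an exact sequence. We
continue the notations of above. Consider the map

(8) $\psi_{p,2} : \Lambda^{p-m}(M \otimes U) \otimes \Lambda^m M \otimes S^{m+1} U \to \Lambda^p(M \otimes U) \otimes U$

(9) $T \otimes m_1 \wedge \cdots \wedge m_m \otimes u^{m+1} \mapsto T \wedge (m_1 \otimes u) \wedge \cdots \wedge (m_m \otimes u) \otimes u$.

Lemma 4.1. $\operatorname{Image} \psi_{p,2} = \ker \psi_p$.

Proof. Observe that

$\psi_p : \bigoplus_{|\pi| = p,\ \ell(\pi) \leq n} S_\pi U \otimes S_{\pi'} M \otimes U \to \bigoplus_{|\mu| = p+1,\ \ell(\mu) \leq m} S_\mu M \otimes M^* \otimes S_{\mu'} U$

is a $GL(U) \times GL(M)$-module map. Now the source of $\psi_{p,2}$ is

$\bigoplus_{|\nu| = p-m,\ \nu_1 \leq m,\ \ell(\nu) \leq n} S_{\nu'} M \otimes S_\nu U \otimes S^{m+1} U \otimes \Lambda^m M$

and a given module in the source with $\nu_n = 0$ maps to $S_{\pi+(1)} U \otimes S_{\pi'} M \subset S_\pi U \otimes U \otimes S_{\pi'} M$ where
$\pi = (m, \nu_1, \dots, \nu_{n-1})$; the proof is similar to the proof of Lemma 3.1. Its other components map
to zero. □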
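The kernel of $\psi_{p,2}$ is the image of

(10) $\psi_{p,3} : \Lambda^{p-m-1}(M \otimes U) \otimes \Lambda^m M \otimes M \otimes S^{m+2} U \to \Lambda^{p-m}(M \otimes U) \otimes \Lambda^m M \otimes S^{m+1} U$

$T \otimes m_1 \wedge \cdots \wedge m_m \otimes m \otimes u^{m+2} \mapsto T \wedge (m \otimes u) \otimes m_1 \wedge \cdots \wedge m_m \otimes u^{m+1}$

and $\psi_{p,3}$ has kernel the image of

(11) $\psi_{p,4} : \Lambda^{p-m-2}(M \otimes U) \otimes \Lambda^m M \otimes S^2 M \otimes S^{m+3} U \to \Lambda^{p-m-1}(M \otimes U) \otimes \Lambda^m M \otimes M \otimes S^{m+2} U$

$T \otimes m_1 \wedge \cdots \wedge m_m \otimes m^2 \otimes u^{m+3} \mapsto T \wedge (m \otimes u) \otimes m_1 \wedge \cdots \wedge m_m \otimes m \otimes u^{m+2}$.

One defines analogous maps $\psi_{p,k}$. By taking the Euler characteristic we obtain:

Lemma 4.2.

$\dim \ker \psi_p = \sum_{j=0}^{p-m} (-1)^j \binom{mn}{p-m-j}\binom{m+j-1}{m-1}\binom{m+n+j}{n-1}$.

The first part of Theorem 1.3 follows. The second part is proved by making an identification
$M \simeq U$ and restricting to $A' = S^2 U \subset U \otimes U$ or $A' = S^2_0 U$, the traceless symmetric matrices.
The second part corresponds to taking $\dim A' = q$.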
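The alternating sum is easy to evaluate numerically. The sketch below assumes that the $j$-th term of the exact sequence is $\Lambda^{p-m-j}(M \otimes U) \otimes \Lambda^m M \otimes S^j M \otimes S^{m+1+j} U$, as suggested by the maps (8), (10) and (11), so that $\dim \ker \psi_p = \sum_{j=0}^{p-m} (-1)^j \binom{mn}{p-m-j}\binom{m+j-1}{m-1}\binom{m+n+j}{n-1}$; the function names are ours. For $m = n = 3$, $p = 4$ it recovers the numbers of Example 3.3.

```python
from fractions import Fraction
from math import ceil, comb


def kernel_dim(m, n, p):
    """dim ker psi_p as an alternating sum (assumed form of Lemma 4.2)."""
    return sum((-1) ** j
               * comb(m * n, p - m - j)      # dim Lambda^{p-m-j}(M (x) U)
               * comb(m + j - 1, m - 1)      # dim S^j M
               * comb(m + n + j, n - 1)      # dim S^{m+1+j} U
               for j in range(p - m + 1))    # empty sum when p < m


def koszul_rank(m, n, l, p):
    """Rank of the map on L (x) U (x) Lambda^p(M (x) U): source dimension minus kernel."""
    return l * (n * comb(m * n, p) - kernel_dim(m, n, p))


def border_rank_bound(m, n, l, p):
    """Rank divided by the rank binom(mn-1, p) contributed by a rank-one tensor."""
    return Fraction(koszul_rank(m, n, l, p), comb(m * n - 1, p))


print(kernel_dim(3, 3, 4), koszul_rank(3, 3, 1, 4))
best = max(border_rank_bound(3, 3, 3, p) for p in range(0, 5))
print(best, ceil(best))
```

For $3 \times 3$ matrices the best value over $p \leq 4$ rounds up to 14, below the bound 15 of Theorem 1.1, in line with Remark 1.4.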

5. Proof of Theorem 1.1 

The essential idea is to choose a subspace $A' \subset M \otimes U$ on which the "restriction" of $\psi_p$
becomes injective. Take a vector space $W$ of dimension 2, and fix isomorphisms $U \simeq S^{n-1}W^*$,
$M \simeq S^{m-1}W^*$. Let $A'$ be the direct summand $S^{m+n-2}W^* \subset S^{m-1}W^* \otimes S^{n-1}W^* = M \otimes U$.

Recall that $S^\alpha W$ may be interpreted as the space of homogeneous polynomials of degree $\alpha$ in
two variables. If $f \in S^\alpha W$ and $g \in S^\beta W^*$ then we can perform the contraction $g \cdot f \in S^{\alpha-\beta}W$.
In the case $f = l^\alpha$ is the power of a linear form $l$, then the contraction $g \cdot l^\alpha$ equals $l^{\alpha-\beta}$ multiplied
by the value of $g$ at the point $l$, so that (for $\beta \leq \alpha$) $g \cdot l^\alpha = 0$ if and only if $l$ is a root of $g$.
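The contraction can be made concrete by letting $g$ act as a constant-coefficient differential operator $g(\partial_x, \partial_y)$ on binary forms. This is one common normalization and an assumption of the sketch below; with it, $g \cdot l^\alpha = \frac{\alpha!}{(\alpha-\beta)!}\, g(l)\, l^{\alpha-\beta}$ for $l = ax + by$, which agrees with the statement above up to a nonzero scalar. Forms are stored as dictionaries mapping exponent pairs to coefficients.

```python
from math import comb, factorial


def power_of_linear_form(a, b, alpha):
    """Coefficients {(i, j): c} of (a*x + b*y)**alpha."""
    return {(i, alpha - i): comb(alpha, i) * a ** i * b ** (alpha - i)
            for i in range(alpha + 1)}


def contract(g, f):
    """Apply g(d/dx, d/dy) to the form f; both are dicts {(i, j): coeff}."""
    out = {}
    for (gi, gj), gc in g.items():
        for (fi, fj), fc in f.items():
            if fi < gi or fj < gj:
                continue  # differentiating too many times gives zero
            coeff = (gc * fc
                     * (factorial(fi) // factorial(fi - gi))
                     * (factorial(fj) // factorial(fj - gj)))
            key = (fi - gi, fj - gj)
            out[key] = out.get(key, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}


def evaluate(g, a, b):
    """Value of the form g at the point (a, b)."""
    return sum(c * a ** i * b ** j for (i, j), c in g.items())


g = {(2, 0): 1, (0, 2): -1}  # g = x^2 - y^2, with roots (1, 1) and (1, -1)
print(contract(g, power_of_linear_form(1, 1, 4)))  # l = x + y is a root of g
print(contract(g, power_of_linear_form(2, 1, 4)))  # l = 2x + y is not a root
```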

Consider the natural skew-symmetrization map 

(12) $A' \otimes \Lambda^{n-1}(A') \to \Lambda^n(A')$.

Recall that representation theory distinguishes a complement $A''$ to $A'$, so the projection $M \otimes U \to A'$
is well defined. Compose (12) with the projection

(13) $M \otimes U \otimes \Lambda^{n-1}(A') \to A' \otimes \Lambda^{n-1}(A')$

to obtain

(14) $M \otimes U \otimes \Lambda^{n-1}(A') \to \Lambda^n(A')$.

Now (14) is equivalent to a map

(15) $\psi'_p : U \otimes \Lambda^{n-1}(A') \to M^* \otimes \Lambda^n(A')$.

We claim (15) is injective. (Note that when n = m the source and target space of (15) are dual 
to each other.) 

Consider the transposed map $S^{m-1}W^* \otimes \Lambda^n S^{m+n-2}W \to S^{n-1}W \otimes \Lambda^{n-1} S^{m+n-2}W$. It is
defined as follows on decomposable elements (and then extended by linearity):

$g \otimes (f_1 \wedge \cdots \wedge f_n) \mapsto \sum_{i=1}^{n} (-1)^{i-1} g(f_i) \otimes f_1 \wedge \cdots \hat{f_i} \cdots \wedge f_n$.

We show this dual map is surjective. Let $l^{n-1} \otimes (l_1^{m+n-2} \wedge \cdots \wedge l_{n-1}^{m+n-2}) \in S^{n-1}W \otimes \Lambda^{n-1} S^{m+n-2}W$
with $l, l_i \in W$. Such elements span the target so it will be sufficient to show any such
element is in the image. Assume first that $l$ is distinct from the $l_i$. Since $n \leq m$, there is a
polynomial $g \in S^{m-1}W^*$ which vanishes on $l_1, \dots, l_{n-1}$ and is nonzero on $l$. Then, up to a
nonzero scalar, $g \otimes (l_1^{m+n-2} \wedge \cdots \wedge l_{n-1}^{m+n-2} \wedge l^{m+n-2})$ maps to our element.

Since the image is closed (being a linear space), the condition that $l$ is distinct from the $l_i$
may be removed by taking limits.

Finally, $\psi'_p \otimes \mathrm{Id}_L$ is the map induced from the restricted matrix multiplication operator and we
may repeat the arguments of §3. To complete the proof of Theorem 1.1, observe that an element
of rank one in $A' \otimes B \otimes C$ induces a map of rank $\binom{n+m-2}{n-1}$. So the border rank of the multiplication
operator must be at least

$\dfrac{\dim L \otimes U \otimes \Lambda^{n-1}(A')}{\binom{n+m-2}{n-1}} = \dfrac{nl\binom{n+m-1}{n-1}}{\binom{n+m-2}{n-1}} = \dfrac{nl(n+m-1)}{m}$.
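The last simplification can be double-checked in exact arithmetic. The sketch below (the function name is ours) uses $\dim A' = \dim S^{m+n-2}W^* = m + n - 1$ and compares the quotient of dimensions with the closed form of Theorem 1.1 over a range of small parameters.

```python
from fractions import Fraction
from math import comb


def restricted_bound(m, n, l):
    """dim(L (x) U (x) Lambda^{n-1} A') divided by the rank of a rank-one element."""
    source_dim = l * n * comb(n + m - 1, n - 1)   # dim A' = n + m - 1
    rank_one_contribution = comb(n + m - 2, n - 1)
    return Fraction(source_dim, rank_one_contribution)


mismatches = [(m, n, l)
              for m in range(1, 9) for n in range(1, m + 1) for l in range(1, 4)
              if restricted_bound(m, n, l) != Fraction(n * l * (n + m - 1), m)]
print(mismatches)
```

An empty list means the two expressions agree for all tested $n \leq m$.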



6. Review of Lickteig's bound 

For comparison, we outline the proof of Lickteig's bound. (Expositions of Strassen's bound
are given in several places, e.g. [6, Chap. 3] and [4, §19.3].) It follows in three steps. The first
combines two standard facts from algebraic geometry: for varieties $X, Y \subset \mathbb{P}V$, let $J(X, Y) \subset \mathbb{P}V$
denote the join of $X$ and $Y$. Then $\sigma_{r+s}(X) = J(\sigma_r(X), \sigma_s(X))$. If $X = Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$
is a Segre variety, then $\sigma_s(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) \subset Sub_s(A \otimes B \otimes C)$, where

$Sub_s(A \otimes B \otimes C) = \{T \in A \otimes B \otimes C \mid \exists A' \subset A,\ B' \subset B,\ C' \subset C,\ \dim A' = \dim B' = \dim C' = s,\ T \in A' \otimes B' \otimes C'\}$.

See, e.g., [6] for details. (The proofs of these facts form the bulk of the paper.) Next Lickteig
observes that if $T \in \sigma_{r+s}(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$, then there exist $A', B', C'$ each of dimension $s$
such that, thinking of $T : A^* \otimes B^* \to C$,

(16) $\dim(T((A')^\perp \otimes B^* + A^* \otimes (B')^\perp)) \leq r$.

This follows because the condition is a closed condition and it holds for points on the open
subset of points in the span of $r + s$ points on $Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$.

Finally, for matrix multiplication, with $A = M \otimes N^*$ etc., he defines $M' \subset M$, $N^{*\prime} \subset N^*$ to
be the smallest spaces such that $A' \subset M' \otimes N^{*\prime}$ and similarly for the other spaces. Then one
applies (16) combined with the observation that $M_{\langle m,n,l\rangle}|_{(A')^\perp \otimes B^*} \subset M' \otimes L^*$ etc., and keeps track of
the various bounds to conclude.